Abstract:

The probability leakage of model M with respect to evidence E is defined. 
Probability leakage is a kind of model error. It occurs when M implies that 
events y, which are impossible given E, have positive probability. Leakage 
does not imply model falsification. Models with probability leakage cannot be 
calibrated empirically. Regression models, which are ubiquitous in statistical 
practice, often evince probability leakage. 
1 Introduction

We take an objectivist Bayesian view of probability, in the school of Jaynes et
al. [8], […], [16], [19]. In this view, probabilities are conditional on evidence E. This
implies, for example, that for some event R we could have $\Pr(R \mid E_1) \neq \Pr(R \mid E_2)$
for two different sets of evidence $E_1$ and $E_2$. For instance, let R = "A five shows". If
$E_1$ = "This is a six-sided object with just one side labeled 'five' which will be
tossed" and $E_2$ is the same as $E_1$ except substituting "just three sides", then
$\Pr(R \mid E_1) = 1/6$ and $\Pr(R \mid E_2) = 3/6$. In other words, Pr(R) is undefined
without conditioning evidence.
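The two probabilities follow from counting sides. Here is a minimal sketch of that arithmetic. It assumes, as the evidence suggests, that E gives no reason to prefer any one side, so each of the six sides gets equal probability. The helper name `pr_five` is hypothetical.

```python
from fractions import Fraction

def pr_five(sides_labeled_five: int, total_sides: int = 6) -> Fraction:
    """Pr(R | E) when E says `sides_labeled_five` of `total_sides` sides show a five
    and (by assumption) gives no reason to favor any side."""
    return Fraction(sides_labeled_five, total_sides)

p_e1 = pr_five(1)  # E_1: just one side labeled 'five'
p_e2 = pr_five(3)  # E_2: just three sides labeled 'five'
print(p_e1, p_e2)  # same event R, different evidence, different probabilities
```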

A model M is sufficient evidence to define a probability, and of course 
different models can and do give different probabilities to the same events. 
If evidence E suggests that some values of an observable are impossible, yet 
M gives positive probability to these events, M is said to evince probability 
leakage with respect to E. 

This has implications for the falsifiability philosophy of [12], which has
led a strange existence in statistics, with many misapprehensions appearing;
e.g. […], [15], […]. This criterion is clarified here with respect to probability leakage.
It is difficult to falsify probability models. 

We also take a Bayesian predictive stance, a view which says that all
parameters are a nuisance: see inter alia [5], […]. This approach allows
us to investigate how probability leakage bears on calibration. Calibration,
defined below, is how closely a model's predictions "match" actual events; e.g.
[…].

Since regression is the most-used statistical model, an example is given 
which shows how badly these models fare even when all standard diagnostic 
measures indicate good performance.



2 Probability Leakage 
2.1 Definition 

Evidence is gathered or assumed which implies that a model M represents
uncertainty in some observable y, perhaps conditional on explanatory observable
variables x and indexed on parameters θ. Data are collected: let
$z = (y_{\text{old}}, x_{\text{old}})$ be its label. Ordinary (and inordinate) interest settles on
(functions of)

$$p(\theta \mid z, M) \propto p(y_{\text{old}} \mid x_{\text{old}}, \theta, M)\, p(\theta \mid x_{\text{old}}, M). \qquad (1)$$

M is understood to include the evidence that gives the prior, $p(\theta \mid x_{\text{old}}, M)$.
The difficulty is that statements like (1) (or functions of it) cannot be verified.
That is, we never know whether (1) says something useful or nonsensical
about the world. Whatever certainty we have in θ after seeing data
(and assuming M true) tells us nothing directly about y. Indeed, even as
$p(\theta \mid z_n, M) \to \delta(t)$ (a delta function) as our sample increases, we still do not
know with certainty the value of future observables y (given x, z, and M;
and supposing that M is itself not a degenerate distribution).

Because we assume the truth of M, the following result holds: 

$$p(y \mid x, z, M) = \sum_{i} p(y \mid x, z, \theta_i, M)\, p(\theta_i \mid z, M), \qquad (2)$$

with the integral replacing the sum if necessary. This result does not just
hold; it is the logical implication of our previous assumptions. That is,
if our assumptions are true, then (2) must be so. Even if sole interest is
in the posterior (1), equation (2) is still implied. That is to say, given our
assumptions, (2) is deduced. This logical truth has consequences.
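Equations (1) and (2) can be made concrete on a discrete grid of parameter values. A grid is exactly the sum form of (2). The sketch below is for illustration only and rests on several assumptions: a Bernoulli observable y with null x, a uniform prior over an assumed grid of θ values, and made-up old data z. The function names are hypothetical.

```python
from typing import List, Sequence

def posterior_on_grid(y_old: Sequence[int], grid: Sequence[float]) -> List[float]:
    """Equation (1) on a grid of theta values: the posterior is proportional to the
    Bernoulli likelihood of y_old times a uniform (assumed) prior over the grid."""
    successes = sum(y_old)
    failures = len(y_old) - successes
    unnormalized = [t ** successes * (1 - t) ** failures for t in grid]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def predictive_prob_one(y_old: Sequence[int], grid: Sequence[float]) -> float:
    """Equation (2) as a sum: Pr(y = 1 | z, M) = sum_i Pr(y = 1 | theta_i, M) p(theta_i | z, M),
    where Pr(y = 1 | theta_i, M) = theta_i under the Bernoulli model."""
    posterior = posterior_on_grid(y_old, grid)
    return sum(theta * weight for theta, weight in zip(grid, posterior))

grid = [i / 100 for i in range(1, 100)]  # theta_1, ..., theta_99 (an assumed grid)
z = [1, 0, 1, 1, 0, 1, 1, 1]             # hypothetical old data, x null
print(round(predictive_prob_one(z, grid), 3))
```

With this made-up z, the predictive probability comes out close to 0.7. That is the value (6 + 1)/(8 + 2) that a continuous uniform prior would give. The output is a statement about the observable y, not about θ.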

The first is the obvious conclusion that (1) does not describe z (nor does
(2)). All that we want to know about z is in z itself. What remains uncertain
are unknown (usually future) values of y. The question is how well M
describes uncertainty in these y. Equation (2) is observable; or, rather, it
can be turned into statements which are about observables. It says that,
given M, the old data, and perhaps a new value of x, y will take
certain values with the calculated probabilities. Suddenly the world of model
verification becomes a possibility (and is explored below).

Now suppose we know, via some evidence E, that y cannot take values
outside some set or interval, such as $(y_a, y_b)$. This evidence implies
$\Pr(y < y_a \mid E) = \Pr(y > y_b \mid E) = 0$. But if, for some value of x (or for none, if x is null),
$\Pr(y < y_a \mid x, z, M) > 0$ or $\Pr(y > y_b \mid x, z, M) > 0$, then we have
probability leakage, at least with respect to E. The probabilities from (2)
are still true, but with respect to M, z, and x. They are not true with respect
to E if there is leakage.

This probability leakage is error, but only if we accept E as true. Leakage
is a number between 0 (the ideal) and 1 (the model has no overlap with the
reality described by E). An example of y with known limits is the GPA of
a college student, known at some institution to lie strictly between 0 and 4.
These limits form our E.
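With limits $(y_a, y_b)$, the leakage is the total probability M gives outside them: $\Pr(y < y_a \mid x, z, M) + \Pr(y > y_b \mid x, z, M)$. Here is a minimal sketch for the GPA case. It assumes, purely hypothetically, that the predictive distribution (2) for a student's GPA is normal with mean 3.2 and standard deviation 0.6. The helper `leakage` is illustrative and not taken from the source.

```python
from statistics import NormalDist
from typing import Callable

def leakage(cdf: Callable[[float], float], lower: float, upper: float) -> float:
    """Probability M gives to y < lower or y > upper, values E says are impossible.
    For a continuous M the endpoints carry no probability, so open or closed limits agree."""
    return cdf(lower) + (1.0 - cdf(upper))

gpa_predictive = NormalDist(mu=3.2, sigma=0.6)  # hypothetical predictive distribution for a GPA
print(f"leakage = {leakage(gpa_predictive.cdf, 0.0, 4.0):.4f}")
```

Under these assumed numbers, nearly all of the roughly 9% leakage sits above 4. The mass below 0 is negligible.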

It's best not to express too rigorous a concern about "leakage sets,"
however, as the following example shows.

Suppose y is the air temperature as measured by a digital thermometer
capable of tenth-of-a-degree precision. This device, like all tangible devices,
will have an upper and lower limit. Because of its resolution, it will also give
measurements belonging to a finite, discrete set. This information forms our
E. So not only will the probability (conditional on E) of extreme events be
0, it will also be 0 for all those values which cannot register on the
device (such as the gaps between the tenths of a degree).

Now, if we were to use any continuous probability model, such as the
normal, to represent uncertainty in this y, we would find the probability
of the measurable values (those actually observable on the device) to be 0. The
probability leakage of the model with respect to E is complete; it is 1, as high
as is possible. Since all real-world measurements known to us are discrete
and finite, yet we so often use continuous probability models to represent our
uncertainty in these observables, we must turn a blind eye towards leakage
of the kind just noted.

To be clear, whenever E implies y is discrete, yet M gives a continuous
representation of y, the probability leakage is 1. Of course, E could be
modified so that it admits of the continuity of y, but it is still true that the
probability of actual observable events (in real-life applications, with respect
to any continuous M) is 0. This, of course, is an age-old problem.

So, to be interesting, we must suppose that y, if quantified by a continuous
probability model, also lives on the continuum. We can still acknowledge (in
our E) upper and lower limits to y, as in the GPA example, where probability
leakage is easy to calculate.

2.2 Model Falsification 

The term falsified is often tossed about, but in a strange and loose way. A
rigorous definition is this: if M says that a certain event cannot happen,
and that event happens, then M is falsified. That is, to be falsified,
M must say that an event is impossible: not unlikely, but impossible. If M
implies that some event is merely unlikely, no matter how small this probability,
M is not falsified if this event obtains. If M implies that the probability
of some event is ε > 0, then if this event happens, M is not falsified, period.
There is thus no escape in the phrase "practically impossible," which has
the same epistemic properties as "practically a virgin." See [17] for how the
former phrase can be turned into mathematics.

Probability leakage does not necessarily falsify M. If there is incomplete 
probability leakage, M says certain events have probabilities greater than 0, 
events which E says are impossible (have probabilities equal to 0). If
E is true, as we assume it is, then the events M said are possible cannot 
happen. But to have falsification of M, we need the opposite: M had to say 
that events which obtained were impossible. 

Falsification occurs when events which M says have probability 0 obtain. This
will be the case each time M is a continuous probability distribution and y
are real-world, i.e. measured, events; for we have seen that the probability of
y taking any measurable value when M is continuous is 0, yet ("real-world")
E tells us that the probabilities of the y in some discrete set are all greater
than 0 (eventually y takes some measurable value).

Replace the continuous M with a discrete M and model falsification becomes
difficult. Suppose M is a Poisson distribution and y is a count with a
known (via some E) upper limit. M says that all counts greater than this
upper limit have probabilities greater than 0. But since (if
E is true) none of these y will ever obtain, M will never be falsified.
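For the Poisson case, the leakage is simply the upper tail beyond the known limit. Here is a minimal sketch. It uses purely hypothetical numbers: an upper limit of 10 set by E, and a Poisson M with mean 4. The function names are illustrative.

```python
import math

def poisson_cdf(n: int, lam: float) -> float:
    """Pr(Y <= n) for Y ~ Poisson(lam), accumulating the pmf recursively."""
    term = math.exp(-lam)  # Pr(Y = 0)
    total = term
    for k in range(1, n + 1):
        term *= lam / k    # Pr(Y = k) from Pr(Y = k - 1)
        total += term
    return total

def poisson_leakage(upper: int, lam: float) -> float:
    """Probability a Poisson(lam) model gives to counts above `upper`, which E rules out."""
    return 1.0 - poisson_cdf(upper, lam)

# Hypothetical numbers: E says the count cannot exceed 10; M is Poisson with mean 4.
print(f"leakage = {poisson_leakage(10, 4.0):.5f}")
```

The leakage is small but always positive, because the Poisson distribution has unbounded support. Yet if E is true, no count above the limit can obtain, so these counts can never falsify M.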

Box gave us an aphorism which has been corrupted to (in the oral tradition;
see [3] for a print version), "All models are wrong." We can see that
this is false: not all models are wrong, i.e. not all are falsified. They are only
wrong, i.e. falsified, if they have complete probability leakage.
2.3 Calibration



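Calibration [7] has three components. Let the empirical distribution of y
be represented as Q, a distribution we assume is deduced from E, and let
the predictive distribution (2) implied by M be P. In shorthand notation,
calibration is defined with respect to probability,

$$\frac{1}{n}\sum_{i=1}^{n} Q_i \circ P_i^{-1}(p) \to p \quad \text{for all } p \in (0, 1);$$

exceedance,

$$\frac{1}{n}\sum_{i=1}^{n} Q_i^{-1} \circ P_i(y) \to y \quad \text{for all } y;$$

and marginally,

$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} Q_i(y) = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} P_i(y) \quad \text{for all } y.$$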

If M evinces probability leakage with respect to E, then M cannot be
calibrated empirically. This is easily proved. In order for M to be calibrated
in probability, the frequency of observed ys for which M says that the probability
of y or less is p must go to p as the sample increases. If there is leakage,
there will be values of y (call them y') such that $0 < P(y') < 1$, but which
are impossible under E, and which will give $Q(y') = 0$ or $Q(y') = 1$, making
probability calibration impossible.
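To see the probability-calibration argument in numbers, consider a single forecast repeated. Each average over i then reduces to $Q \circ P^{-1}(p)$. The sketch assumes two hypothetical distributions, chosen purely for illustration: reality consistent with E is Exponential(1), so y > 0, and the predictive P from M is Normal(1, 1).

```python
import math
from statistics import NormalDist

P = NormalDist(mu=1.0, sigma=1.0)  # hypothetical predictive distribution implied by M

def Q(y: float) -> float:
    """Hypothetical true CDF consistent with E (Exponential(1)): y <= 0 is impossible."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0

leak = P.cdf(0.0)  # probability M gives to y <= 0, which E rules out (about 0.159)

# Probability calibration needs Q(P^{-1}(p)) == p. For every p below `leak`,
# P^{-1}(p) is negative, so Q(P^{-1}(p)) is 0 however large the sample grows.
for p in (0.05, 0.10, 0.15, 0.50, 0.90):
    print(f"p = {p:.2f}  Q(P^-1(p)) = {Q(P.inv_cdf(p)):.3f}")
```

For p above the leaked mass, the mismatch here also reflects the different shapes of P and Q. Only the failure for p below `leak` is attributable to leakage as such.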

In order for M to be calibrated in exceedance, each value of y must be as
probable under P as under Q. If there is leakage, there will be values of y (again
y') for which $0 < P(y') < 1$ but, under E, $Q(y') = 0$ or $1$, so that
$Q^{-1} \circ P(y') \neq y'$, making exceedance calibration impossible.

In order for M to be calibrated marginally, its marginal distribution must
match the empirical distribution. But if there is probability leakage, there
will be impossible values of y (again y') for which $\frac{1}{n}\sum_i P_i(y') \to p$ with
$0 < p < 1$, but for which $Q(y') = 0$ or $1$, making marginal calibration impossible.

As [7], […], and others show, if M is to be evaluated by a strictly proper
scoring rule, the lack of calibration guarantees that better models than M
exist.

3 Example 

Statistics as she is practiced (not as we picture her in theoretical perfection)
is rife with models exhibiting substantial probability leakage. This will become
obvious from taking just one example, not previously published. It is well to
point out that the results of most statistical analyses are not published; they
remain in private hands and are used in decision making everywhere.

Regression is ubiquitous. The regression model M assumes that y is
continuous and that uncertainty in y, given some explanatory variables x, is
quantified by a normal distribution, the parameters of which are given,
usually tacitly, "flat" (improper) priors. This M logically implies a t
distribution for (2); see [1]. This M has the advantage of mimicking standard
frequentist results.

I want to emphasize that I do not justify this model for this data; better 
ones certainly exist. I only claim that regression is often used on data just 
like this. The purpose here is only to show how horribly wrong a model can 
go. 

The data are from one of two call center help lines: y is a measure of
abandonment, with larger numbers indicating more abandoned calls. It is
impossible that y be less than 0, but it can equal 0. It has no obvious upper
limit. It ranged from 0.08 to 14.1. It was to be explained by (the xs) the
number of calls answered, location, and number of absentees (the number of
people who were scheduled to work but did not show). It was expected that
more calls answered would lead to higher abandonment; it was assumed
that the locations differed only in behavior which would give different
abandonments; and it was expected that more absentees would lead to
higher abandonment. Calls answered could not be less than 0: they ranged
from 110 to 2,995. Absentees could not be less than 0: they ranged from 1
to 14. There were 52 samples from each location (104 in total).

Frequentist model diagnostics gave p-values of 1e-5 for absentees, 0.004 for
calls answered, and 1e-5 for the location difference. The point estimates were
of the size and in the directions expected. Bayesian posteriors on the model
parameters showed that each was safely different from 0 (with probabilities of at
least 0.999). Visual examination of the "residuals" showed nothing untoward.
All in all, it was a very standard regression which performed to expectations and
which resulted in a satisfied client.

But the model is poor for all that, because of massive probability leakage.
Our E tells us that y cannot be less than 0. But employing equation (2) with
new values of calls answered and absentees set at 1,200 and 5 (the sample
medians) respectively gives Figure 1. The logical implication of M is that,
for these values of x, there is about a 38% chance for values of y less than 0
at Location A. Even for Location B, there is still a small chance (about 2%)
for values of y less than 0.
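The leakage computation behind a figure like this can be sketched as follows. The sketch assumes the flat prior takes the common form p(β, σ²) ∝ 1/σ². Under that prior, the predictive (2) at a new x is Student-t with n − k degrees of freedom, location $x^\top \hat\beta$, and scale $s\sqrt{1 + x^\top (X^\top X)^{-1} x}$. The t CDF is done by Simpson's rule, a stand-in for a library routine such as `scipy.stats.t.cdf`. The data behind Figure 1 are not available here, so the usage lines generate synthetic, hypothetical values and will not reproduce the 38% or 2% figures. All function names are illustrative.

```python
import math
import numpy as np

def t_cdf(t: float, df: float, steps: int = 4000) -> float:
    """Student-t CDF via Simpson's rule on the density over [0, |t|]; `steps` must be even."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(t)
    h = a / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = (pdf(0.0) + pdf(a) + inner) * h / 3
    return min(1.0, max(0.0, 0.5 + math.copysign(area, t)))

def regression_leakage(X: np.ndarray, y: np.ndarray, x_new: np.ndarray, lower: float = 0.0) -> float:
    """Pr(y_new < lower | x_new, z, M) for normal linear regression with the (assumed) prior
    p(beta, sigma^2) proportional to 1/sigma^2; the predictive is Student-t with n - k df."""
    n, k = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    s2 = float(resid @ resid) / (n - k)                              # residual variance estimate
    xtx_inv = np.linalg.inv(X.T @ X)
    loc = float(x_new @ beta_hat)                                    # predictive location
    scale = math.sqrt(s2 * (1.0 + float(x_new @ xtx_inv @ x_new)))  # predictive scale
    return t_cdf((lower - loc) / scale, df=n - k)

# Synthetic, hypothetical data shaped loosely like the example (not the author's data).
rng = np.random.default_rng(0)
n = 104
calls = rng.uniform(110, 2995, n)
absentees = rng.integers(1, 15, n)
location_b = np.repeat([0.0, 1.0], n // 2)  # 0 = Location A, 1 = Location B
noise = rng.normal(0.0, 2.0, n)
y = np.abs(1.0 + 0.001 * calls + 0.3 * absentees + 2.0 * location_b + noise)  # y >= 0, as E requires
X = np.column_stack([np.ones(n), calls, absentees, location_b])

for label, flag in (("A", 0.0), ("B", 1.0)):
    x_new = np.array([1.0, 1200.0, 5.0, flag])  # calls answered = 1,200, absentees = 5
    print(f"Location {label}: leakage below 0 = {regression_leakage(X, y, x_new):.3f}")
```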

If instead we took the minimum observed values of each x (not pictured), 
then the probability leakage for Location A is a whopping 92%, and for 
Location B it is 50%. These are unacceptably large errors. 

In any problem where x is null we can, after observing z, calculate the 
amount of probability leakage. Otherwise, the amount, for any given M, E 
and z, depends on what values of x are expected. 

For example, if we change the M above to a null x (i.e. the model assumes
only that our uncertainty in y is characterized by a normal distribution),
then the probability leakage is the probability, given M and z, of y less than
0, which in this case is just north of 10%. A substantial error.
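With a null x, the leakage depends only on z. Here is a minimal sketch. It assumes, as above, the flat prior p(μ, σ²) ∝ 1/σ². Under that prior, the predictive is Student-t with n − 1 degrees of freedom, location ȳ, and scale $s\sqrt{1 + 1/n}$. The t CDF is again computed by Simpson's rule. The data values and function names are hypothetical.

```python
import math
import statistics

def t_cdf(t: float, df: float, steps: int = 4000) -> float:
    """Student-t CDF via Simpson's rule on the density over [0, |t|]; `steps` must be even."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(t)
    h = a / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = (pdf(0.0) + pdf(a) + inner) * h / 3
    return min(1.0, max(0.0, 0.5 + math.copysign(area, t)))

def null_model_leakage(y_old, lower: float = 0.0) -> float:
    """Pr(y_new < lower | z, M) for a normal M with no regressors and the assumed flat prior."""
    n = len(y_old)
    ybar = statistics.mean(y_old)
    s = statistics.stdev(y_old)           # sample standard deviation (n - 1 divisor)
    scale = s * math.sqrt(1.0 + 1.0 / n)  # predictive scale
    return t_cdf((lower - ybar) / scale, df=n - 1)

y_old = [0.4, 2.1, 0.9, 3.5, 1.2, 0.1, 5.8, 2.7]  # hypothetical abandonment values
print(f"leakage below 0: {null_model_leakage(y_old):.3f}")
```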

But if we keep the xs as before, the probability leakage changes with
the values of x. This opens the possibility of modeling our uncertainty in
probability leakage for a given M, y, x, and z. It requires, of course, a new
model for the leakage as a function of x, with all that that entails. This is
left for further research. For now, we rely only on the supposition that the
range of xs we have seen before is possible, even likely, to be realized in new
data.

4 Conclusion 

Probability leakage may be difficult or impossible to compute when limits
aren't pertinent, yet there still may be E such that the probabilities of certain
values of y judged to be extremely unlikely are not interesting. An example
is when y represents a financial gain or loss. No obvious limits exist. We
assume y's continuity (via M and E). Yet we might, given some evidence about
the market, say y is so unlikely outside of certain limits that these limits
are "practically impossible." We have already seen that this phrase is troublesome,
but if we employ it, our situation amounts to assuming strict
limits as before.

We might be able to soften the concept of leakage in the absence of limits.
An expert might, via probability elicitation, sketch a density (or distribution)
for y (possibly given some x). If the distribution (2) implied by M is "far"
from this elicitation, then either the expert is mistaken or M is. What is
far must be quantified: a possible candidate (among many) is the Kullback-Leibler
divergence; see for example [5]. This possibility is not explored here.



An objection to the predictive approach is that interest is solely in the
posterior (1); in whether, say, the hypothesis H that absentees had an
effect on abandonment is true. But notice that the posterior does not say with what
probability absentees had an effect: it instead says that if M is true, and given z,
the probability that the parameter associated with absentees is this-and-such.
If M is not true (it is falsified), then the posterior has no bearing on H. In
any case, the posterior does not give us $\Pr(H \mid z)$; it gives $\Pr(H \mid M, z)$. We
cannot answer whether H is likely without referencing M, and M implies
(2).

Probability leakage is far from the last word in model validation. It does
not answer many questions about the usefulness of models. Nor can it always
in isolation tell whether a model is likely true or false: see the book by [18]
for an overview of model validation. Leakage can, however, give a strong
indication of model worth.
[Figure 1: plot not reproduced; horizontal axis: Abandonment]

Figure 1: The posterior predictive distribution for a normal model with no
regressors (dash-dotted line), and for the full regression model at Location A
(solid line) and at Location B (dashed line), with the regressors set at their
observed medians. The vertical line indicates abandonment of 0; values
below it are impossible with respect to E. Substantial probability is given
to impossible values under M.